Visualisation of heterogeneous data with simultaneous feature saliency using Generalised Generative Topographic Mapping

Authors

  • Shahzad Mumtaz
  • Michel F. Randrianandrasana
  • Gurjinder Bassi
  • Ian T. Nabney
Abstract

Most machine-learning algorithms are designed for datasets with features of a single type, whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model, called generalised generative topographic mapping (GGTM), that handles mixed types within a probabilistic latent variable formalism. It describes the data by type-specific distributions that are conditionally independent given the latent space. It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated on both synthetic and real datasets.
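
The abstract does not state the GGTMFS equations, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes a common feature-saliency formulation in which each feature is generated by a latent-point-specific distribution with probability rho (its saliency) and by a shared background distribution otherwise, with Gaussian continuous features and Bernoulli binary features treated as conditionally independent given the latent point. All function names, parameters, and distribution choices here are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density, evaluated elementwise."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def bernoulli_pmf(x, p):
    """Probability of binary x under success probability p, elementwise."""
    return np.where(x == 1, p, 1.0 - p)

def point_likelihood(x_cont, x_bin, means, var, probs,
                     bg_mean, bg_var, bg_prob, rho_cont, rho_bin):
    """Likelihood of one mixed-type data point under each of K latent points.

    Assumed shapes: means (K, Dc), probs (K, Db); x_cont, bg_mean, rho_cont (Dc,);
    x_bin, bg_prob, rho_bin (Db,). The rho values are saliencies in [0, 1].
    """
    # Each feature mixes a latent-point-specific term with a shared background term.
    cont = (rho_cont * gaussian_pdf(x_cont, means, var)
            + (1.0 - rho_cont) * gaussian_pdf(x_cont, bg_mean, bg_var))
    binary = (rho_bin * bernoulli_pmf(x_bin, probs)
              + (1.0 - rho_bin) * bernoulli_pmf(x_bin, bg_prob))
    # Conditional independence given the latent point: multiply across features.
    return cont.prod(axis=1) * binary.prod(axis=1)

def responsibilities(lik):
    """E-step posterior over latent points, assuming a uniform prior."""
    return lik / lik.sum()

# Toy example with K = 2 latent points, one continuous and one binary feature.
means = np.array([[0.0], [5.0]])
probs = np.array([[0.9], [0.1]])
x_cont, x_bin = np.array([0.1]), np.array([1])
bg_mean, bg_var, bg_prob = np.array([2.5]), 4.0, np.array([0.5])

r_salient = responsibilities(point_likelihood(
    x_cont, x_bin, means, 1.0, probs, bg_mean, bg_var, bg_prob,
    np.array([1.0]), np.array([1.0])))
r_noise = responsibilities(point_likelihood(
    x_cont, x_bin, means, 1.0, probs, bg_mean, bg_var, bg_prob,
    np.array([0.0]), np.array([0.0])))
```

In this toy setting, features with saliency 1 pull the data point towards the first latent point, whereas features with saliency 0 carry no information about the latent position, which is why down-weighting noisy features can be expected to help a visualisation.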

Similar resources

Visualisation of Heterogeneous Data with the Generalised Generative Topographic Mapping

Heterogeneous and incomplete datasets are common in many real-world applications. The probabilistic nature of the Generative Topographic Mapping (GTM), which originally handles only complete continuous data, makes it possible to extend it to visualise mixed-type and missing data as well, as suggested in (Bishop et al., 1998a). This paper describes this generalisation of GTM and assesses the result...

Novel Visualisation Methods for Protein Data

Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log space values at certain steps of the Expectation Maximization (EM) al...

Topology-Preserving Mappings for Data Visualisation

We present a family of topology-preserving mappings similar to the Self-Organizing Map (SOM) and the Generative Topographic Map (GTM). These techniques can be considered as a non-linear projection from input or data space to the output or latent space (usually 2D or 3D), plus a clustering technique that updates the centres. A common frame based on the GTM structure can be used with different ...

Visualisation of tree-structured data through generative probabilistic modelling

We present a generative probabilistic model for the topographic mapping of tree-structured data. The model is formulated as a constrained mixture of hidden Markov tree models. A natural measure of likelihood arises as a cost function that guides the model fitting. We compare our approach with an existing neural-based methodology for constructing topographic maps of directed acyclic graphs. We arg...

Preliminary theoretical results on a feature relevance determination method for Generative Topographic Mapping

Feature selection (FS) has long been studied in classification and regression problems, following diverse approaches and resulting in a wide variety of methods, usually grouped as either filters or wrappers. In comparison, FS for unsupervised learning has received far less attention. For many real problems concerning unsupervised multivariate data clustering, FS becomes an issue of paramount im...

Publication date: 2015